Data Sourcing

Data were sourced from Yahoo! Finance. We focus on 27 stocks that have been in the Dow Jones Index for at least two years.

Data Cleaning

We create a data frame of daily closing prices for each stock over the four most recent financial quarters.

# create a matrix with 27 stocks and 252 trading days
# 27 stocks (rows) and 252 daily closing prices (columns/features/predictors)

companies.closings <- matrix(data = NA, nrow = length(companies), 
                             ncol = length(dataMMM$MMM.Close))

# fill one row per stock; seq_along() is safe even if the list is empty
for (i in seq_along(companies.df)){
  companies.closings[i,] <- as.numeric(companies.df[[i]][,4]) # closings are in the 4th column
}

# change the names of the rows
rownames(companies.closings) <- companies

# take the transpose
# each row is a trading day with 27 different stock prices 
# each column is a stock
companies.closings.t <- t(companies.closings)

# index the trading days 1, 2, ..., 252
day <- seq_len(nrow(companies.closings.t))

# first column is the day index, followed by one column per stock
df <- as.data.frame(cbind(day, companies.closings.t))
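
The reshaping above can also be sketched on toy data, which makes the resulting shape easy to inspect. This is a minimal sketch rather than the code used for the analysis: the tickers, the simulated prices, and the names `toy.df`, `closings`, and `toy` are all hypothetical, and the only thing carried over from the real data is the assumption that closing prices sit in the 4th column of each per-stock table.

```r
# Toy stand-in for companies.df: a named list of per-stock tables whose
# 4th column holds the closing price (simulated values, not real quotes).
set.seed(1)
tickers <- c("AAA", "BBB", "CCC")   # hypothetical tickers
n_days  <- 5

toy.df <- lapply(tickers, function(tk) {
  m <- matrix(runif(n_days * 6, min = 90, max = 110), nrow = n_days)
  colnames(m) <- paste(tk, c("Open", "High", "Low", "Close", "Volume", "Adjusted"), sep = ".")
  m
})
names(toy.df) <- tickers

# Pull the 4th column of every table; vapply() returns a matrix with
# one row per trading day and one column per stock, so no transpose is needed.
closings <- vapply(toy.df, function(x) as.numeric(x[, 4]), numeric(n_days))

# Prepend the day index, as in the data frame built above
toy <- data.frame(day = seq_len(n_days), closings)
str(toy)
```

Building the matrix column by column keeps stocks as columns from the start, which is the orientation that both the line plot and `prcomp()` expect.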



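library(plotly)  # provides plot_ly(), add_trace() and the %>% pipe

# start the line chart with the first stock (MMM)
asset1 <- plot_ly(data = df, x = ~day, y = ~MMM, name = 'MMM', type = 'scatter', mode = 'lines', 
                 line = list(color = 'rgb(1, 1, 1)'))

# add the remaining stocks; column 1 of df is `day`, so stock i is in column i + 1
for (i in 2:length(companies)){
  asset1 <- asset1 %>% add_trace(y = df[, i + 1], name = companies[i], 
                                 line = list(color = sprintf('rgb(%d, %d, %d)', i, i, i))) 
}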

Asset 2: PCA Biplots

Using principal component analysis (PCA), we reduce the data to its first two principal components, shown in the biplots below. In 2019 (top), when the US economy was functioning normally, the stocks tend not to be correlated with one another: their vectors radiate in all directions. In 2020 (bottom), the prices of most, if not all, companies fell because of COVID-19. The biplot reflects this, since the vectors of all companies point in the same general direction.

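# half-year cutoff: first half is 2019 Q3 & Q4, second half is 2020 Q1 & Q2
# integer division keeps the index whole even if the number of days is odd
half <- nrow(companies.closings.t) %/% 2

# centre and scale each stock so that price levels do not dominate the components
pca_2019 <- prcomp(companies.closings.t[1 : half, ], scale. = TRUE, center = TRUE)
pca_2020 <- prcomp(companies.closings.t[(half + 1) : (2*half), ], scale. = TRUE, center = TRUE)

# stack the two biplots vertically
par(mfrow=c(2,1)) 
biplot(pca_2019, main = "2019 - Q3 & Q4")
biplot(pca_2020, main = "2020 - Q1 & Q2")

# restore the default single-panel layout
par(mfrow=c(1,1))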
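
The visual reading of the biplots can be backed by two simple numbers: the share of variance captured by the first principal component, and whether all stocks load on it with the same sign. The following is a minimal sketch on simulated data, not the Dow Jones prices; the two regimes, the helper names `pc1_share` and `same_sign`, and the noise levels are assumptions chosen only to illustrate the contrast.

```r
set.seed(42)
n_days   <- 126   # roughly half a trading year
n_stocks <- 10

# Regime A: every stock moves independently of the others
independent <- matrix(rnorm(n_days * n_stocks), nrow = n_days)

# Regime B: every stock follows one shared market factor plus a little noise
market   <- rnorm(n_days)
comoving <- sapply(seq_len(n_stocks), function(j) market + rnorm(n_days, sd = 0.3))

pca_indep  <- prcomp(independent, center = TRUE, scale. = TRUE)
pca_comove <- prcomp(comoving,    center = TRUE, scale. = TRUE)

# Share of total variance explained by the first principal component
pc1_share <- function(p) p$sdev[1]^2 / sum(p$sdev^2)

# TRUE when every variable loads on PC1 with the same sign, which is
# what "all vectors point in the same general direction" looks like
same_sign <- function(p) {
  l <- p$rotation[, 1]
  all(l > 0) || all(l < 0)
}

cat("PC1 share, independent:", round(pc1_share(pca_indep), 2), "\n")
cat("PC1 share, co-moving:  ", round(pc1_share(pca_comove), 2), "\n")
cat("Same-sign PC1 loadings, co-moving:", same_sign(pca_comove), "\n")
```

Applied to `pca_2019` and `pca_2020`, the same two helpers would give a rough numerical check of the pattern described above, with a larger PC1 share and same-sign loadings expected in the period when stocks fell together.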